We give a method for the correction of confidence intervals when the original interval
does not have the correct nominal coverage probabilities in the frequentist sense. Our
method is general and does not require any distributional assumptions. It can be applied
to both frequentist and Bayesian inference where interval estimates are desired. We
provide theoretical results for the consistency of our proposed estimator, and provide two
complex examples, on confidence interval correction for composite likelihood estimators
and in approximate Bayesian computation, to demonstrate the wide applicability of our
method.

Keywords: Approximate Bayesian computation; Confidence interval correction; Composite
likelihood; Coverage probability.

1 Introduction

Interval estimates are typically intended to have a specified level of coverage. This is
true, for example, of both frequentist confidence intervals and Bayesian credible
intervals. However, for many problems the coverage of a confidence or credible interval will
only equal its nominal value asymptotically, and coverage can be poor even for quite
large samples in some situations. In many complex problems, there can be inherent bias
which can be difficult to quantify or calculate. This can arise, for example, in composite
likelihood problems (Varin et al. 2011) and approximate Bayesian computation (Sisson
and Fan 2011). In this paper we propose a novel procedure for adjusting interval
estimates that has wide application and will typically reduce the bias in their coverage.

The procedure assumes that the mechanism that generated the sample data could be
simulated if population parameters were known. These parameters are estimated by sample
statistics derived from real data, and then pseudo-samples are drawn from the estimated
population distribution. From each pseudo-sample a confidence/credible interval is
determined for the quantity of interest. The frequentist bias in these intervals is calculated
and then used to adjust the interval estimate given by the real data.

The method has similarities to the double-bootstrap (Davison and Hinkley 1997), in
which a bias correction is applied to a bootstrap interval by re-sampling from a
bootstrap distribution. A difference in our method is that it involves only one level of sample
generation, which makes it computationally less demanding, although the saving may
dissipate if computationally demanding methods (such as Markov chain Monte Carlo)
are used to obtain the interval estimates from sample data.

In some of our examples, we apply our procedure to reduce bias in the coverage of 
Bayesian credible intervals. This may not seem intuitive, since coverage is a frequentist 
property while a Bayesian interval may reflect personal probabilities. However, there 
are many situations where posterior distributions should preferably be well calibrated. 
These include inference with objective or probability matching prior distributions, the 
verification of Bayesian simulation software (Cook et al. 2006), and techniques and
diagnostics in likelihood-free Bayesian inference (Fearnhead and Prangle 2012; Prangle et al.
2012).



In Section 2 we describe the proposed method and give theoretical results related to it,
illustrating them through simulated examples. In Section 3 we apply our method to
two more complex, real analyses. One of these involves estimation with composite
likelihoods, which is known to produce confidence intervals that are too narrow, and the
other involves approximate Bayesian computation, which typically gives wider posterior
credible intervals than desired. Some concluding comments are given in Section 4.



2 Coverage correction for confidence intervals 

Suppose we are interested in estimating an equal-tailed $100(1-\alpha)\%$ confidence interval
for some parameter $\theta \in \Theta \subset \mathbb{R}$. Thus for observed data $x$, we seek an
estimate $L(x)$ such that

$$P(\theta < L(x)) = \alpha/2,$$

where $L(x)$ denotes the lower limit of the interval. Similarly for the upper limit, we seek
an estimate $U(x)$ such that

$$P(\theta > U(x)) = \alpha/2.$$

In the frequentist setting, the expressions above are written in terms of pivotal functions
of the data, $x$, and parameter, $\theta$, since the parameter $\theta$ is considered a fixed quantity. In
the Bayesian setting, credible intervals are based on the posterior distribution of $\theta$. In an
abuse of notation, we will use the above notation in both cases. For a given $\theta$, coverage
probabilities can be estimated by simulating from the data model $f(x \mid \theta)$ multiple times,
constructing the relevant intervals, and then counting the proportion of intervals which
contain $\theta$.
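
The simulation-based estimate of coverage described above can be sketched as follows. This is a minimal illustration only: the function names (`estimate_coverage`, `simulate_normal`, `z_interval`) and the normal-mean example are assumptions introduced here, not part of the analyses reported in this paper.

```python
import numpy as np

def estimate_coverage(simulate, interval, theta, n_rep=2000, seed=0):
    """Monte Carlo estimate of P(L(x) <= theta <= U(x)) when x ~ f(x | theta)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        x = simulate(theta, rng)        # one dataset from the data model
        lower, upper = interval(x)      # same interval construction as for real data
        hits += int(lower <= theta <= upper)
    return hits / n_rep

# Hypothetical illustration: mean of N(theta, 1) data, sample size m = 20.
def simulate_normal(theta, rng, m=20):
    return rng.normal(theta, 1.0, size=m)

def z_interval(x, z=1.959964):
    half_width = z / np.sqrt(len(x))    # known unit variance
    return x.mean() - half_width, x.mean() + half_width

coverage = estimate_coverage(simulate_normal, z_interval, theta=0.0)
print(f"estimated coverage: {coverage:.3f}")
```

Any interval construction can be passed as `interval`, so the same routine could be used to check an interval before and after adjustment.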



We will assume that the method of obtaining the lower and upper limits $L(x)$ and
$U(x)$ is known, but this can be generic. We do not assume that these estimates produce
the correct coverage probability. We also assume that the population parameters can be
well approximated from the data. Our goal is to provide a method that gives adjustments
that improve this coverage. We first give theoretical results for the proposed
methodology, and then give details of its implementation.

2.1 Theoretical results



Assumption 1 We suppose that the observed data $x$ come from the model given by $f(x \mid \theta)$,
$\theta \in \Theta$. For any $\theta \in \Theta$, we assume that it is possible to simulate from $f(\cdot \mid \theta)$.

Assumption 2 Given $\theta$ and data $x \sim f(x \mid \theta)$ there exists a consistent estimator $\hat\theta$ of $\theta$.

Assumption 1 requires that we are able to simulate replicate data from the model given
the values of the parameters. Assumption 2 requires that we have a good estimator for
$\theta$, so that interval estimates obtained using $\hat\theta$ converge to those estimates obtained using
the population parameter $\theta$, as the amount of data gets large.

In the following, we only require the lengths of the intervals to be consistent.
Consequently, Assumption 2 is not always necessary. For example, it is not needed if $\theta$ is a
location parameter whose confidence interval has a length that is independent of $\theta$ (see
Example 1 in Section 2.3). In the frequentist setting, the maximum likelihood estimator of
$\theta$ is consistent, and unbiased in many finite sample situations. In the Bayesian setting, the
posterior distribution is consistent under mild assumptions, and the posterior mean
estimate of $\theta$ is asymptotically unbiased. However, in both cases, finite sample bias in $\hat\theta$
may render our method less accurate.

Theorem 2.1 For some $\theta$ and $x \sim f(x \mid \theta)$, let $L(x)$ be an estimator of the lower limit of a
$100(1-\alpha)\%$ level confidence interval, and suppose that

$$P\{\theta < L(x)\} \neq \alpha/2.$$

Let $G_{\{W\}}$ denote the distribution function of a random variable $W$. Consider the new estimator

$$L_c(x) = L(x) + \xi_{\alpha/2}, \qquad (1)$$

where $\xi_{\alpha/2}$ is the $\alpha/2$-th quantile of the distribution function $G_{\{\theta - L(x)\}}$, so that
$G_{\{\theta - L(x)\}}(\xi_{\alpha/2}) = \alpha/2$. Then the new estimator, $L_c(x)$, will have the correct coverage
probability

$$P\{\theta < L_c(x)\} = \alpha/2.$$

Proof: See Appendix.

From the above theorem, it can then be seen that for the estimator of the upper limit of a
$100(1-\alpha)\%$ confidence interval, $U(x)$, we can write

$$U_c(x) = U(x) + \xi_{1-\alpha/2}, \qquad (2)$$

where $\xi_{1-\alpha/2}$ is the $(1-\alpha/2)$-th quantile of the distribution function $G_{\{\theta - U(x)\}}$, so that
$G_{\{\theta - U(x)\}}(\xi_{1-\alpha/2}) = 1 - \alpha/2$. In this case, we then have that

$$P\{\theta > U_c(x)\} = \alpha/2.$$

Theorem 2.2 For some $\theta$ and observed data $x \sim f(x \mid \theta)$, suppose the lower limit of a
$100(1-\alpha)\%$ confidence interval $L(x)$ can be obtained, and that this estimate does not necessarily
give the correct coverage probability. Suppose that $\hat\theta \in \mathbb{R}$ is a consistent estimator of $\theta$,
evaluated using the data $x$. Let $y_1, \ldots, y_n$ be $n$ replicate datasets simulated independently from
$f(\cdot \mid \hat\theta)$, and denote the corresponding lower confidence limits by $L_1(y_1), \ldots, L_n(y_n)$,
obtained in the same manner as $L(x)$. Define

$$\hat G_{\{\hat\theta - L(y)\}}(\xi) = \frac{1}{n} \sum_{i=1}^{n} I_{\{\hat\theta - L_i(y_i) < \xi\}}$$

as the empirical distribution of $\hat\theta - L(y)$ based on the observed values of $\hat\theta - L_i(y_i)$,
$i = 1, \ldots, n$. If we define

$$\hat L_c(x) = L(x) + \hat\xi_{\alpha/2}, \qquad (3)$$

where $\hat\xi_{\alpha/2} = \hat G^{-1}_{\{\hat\theta - L(y)\}}(\alpha/2)$, then $\hat L_c(x)$ is a consistent estimator of $L_c(x)$, as
defined in Equation (1).

Proof: See Appendix.



In combination, Theorems 2.1 and 2.2 state that if we simulate replicate datasets
$y_1, \ldots, y_n$ from $f(\cdot \mid \hat\theta)$
and subsequently obtain the confidence limits $L_1(y_1), \ldots, L_n(y_n)$ in the same way as for
the original data $x$, then we can correct the bias in the original lower limit estimate, $L(x)$,
by addition of the $\alpha/2$-th sample quantile of $\hat\theta - L_1(y_1), \ldots, \hat\theta - L_n(y_n)$.



Corollary 1 Under the assumptions in Theorem 2.2, a central limit theorem holds for $\hat L_c(x)$.
Specifically, for all $\alpha \in (0, 1)$, $\theta \in \mathbb{R}$ and $x \sim f(x \mid \theta)$, we have that

$$\sqrt{n}\,\big(\hat L_c(x) - L_c(x)\big)\, G'_{\{\theta - L(x)\}}(\xi_\alpha) \to N(0, \alpha(1-\alpha))$$

as $n \to \infty$, where $G'_{\{\theta - L(x)\}}(\xi) = \frac{d}{d\xi} G_{\{\theta - L(x)\}}(\xi)$, and $\xi_\alpha$ is the $\alpha$-th quantile
of $G_{\{\theta - L(x)\}}$.

Proof: The result follows immediately from Equation (7) of the proof for Theorem 2.2
(see Appendix).



The above theoretical results provide a simple way of estimating corrections to the lower 
and upper confidence limits that will produce the correct nominal coverage probability. 
In addition, these estimators are consistent and asymptotically normal. 
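
The asymptotic normality of a sample quantile, on which Corollary 1 rests, can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original analysis: a standard normal variable stands in for $\theta - L(x)$, the quantile level is $p = 0.025$, and the function name `asymptotic_quantile_sd` is introduced here. It compares the standard deviation implied by the limit, $\sqrt{p(1-p)/n}\,/\,G'(\xi_p)$, with a Monte Carlo estimate.

```python
import numpy as np
from statistics import NormalDist

def asymptotic_quantile_sd(p, n, density_at_quantile):
    """Large-n standard deviation of the p-th sample quantile from n draws."""
    return np.sqrt(p * (1.0 - p) / n) / density_at_quantile

p, n, n_rep = 0.025, 500, 2000
std_normal = NormalDist()
xi_p = std_normal.inv_cdf(p)                      # true p-th quantile
sd_theory = asymptotic_quantile_sd(p, n, std_normal.pdf(xi_p))

# Monte Carlo: n_rep independent samples of size n, one sample quantile each.
rng = np.random.default_rng(1)
draws = rng.normal(size=(n_rep, n))
sd_mc = np.quantile(draws, p, axis=1).std(ddof=1)

print(f"asymptotic sd: {sd_theory:.4f}, Monte Carlo sd: {sd_mc:.4f}")
```

In practice the density $G'$ is unknown, so a calculation of this kind is mainly useful for judging how large $n$ should be in a setting where the distribution can be approximated.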



2.2 Correction procedure 

In summary, the correction algorithm has the following steps:

Step 1 Obtain $L(x)$ and $U(x)$, the lower and upper limits of the desired $100(1-\alpha)\%$ confidence
interval for the parameter $\theta$, for an observed dataset $x$.

Step 2 Evaluate $\hat\theta$ and generate $n$ independent datasets $y_1, \ldots, y_n \sim f(y \mid \hat\theta)$ from the model.

Step 3 For each dataset $y_i$, compute the $100(1-\alpha)\%$ lower and upper confidence limits, $L_i(y_i)$
and $U_i(y_i)$, for the parameter $\theta$, using the same method as in Step 1.

Step 4 Set the corrected lower and upper limits to

$$\hat L_c(x) = L(x) + \hat G^{-1}_{\{\hat\theta - L(y)\}}(\alpha/2)$$
$$\hat U_c(x) = U(x) + \hat G^{-1}_{\{\hat\theta - U(y)\}}(1-\alpha/2)$$

where $\hat G^{-1}_{\{W\}}(\alpha)$ denotes the $\alpha$-th sample quantile of the random variable $W$.

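
Steps 1 to 4 can be sketched in a few lines. The following is a minimal illustration, assuming the user supplies three functions: `estimate` (returning $\hat\theta$), `simulate` (drawing a dataset from the model given a parameter value) and `interval` (returning the lower and upper limits). These names, and the deliberately too-narrow interval used in the usage example, are hypothetical and are not taken from the analyses in this paper.

```python
import numpy as np

def correct_interval(x, estimate, simulate, interval, alpha=0.05, n=500, seed=0):
    """Coverage-corrected limits following Steps 1-4."""
    rng = np.random.default_rng(seed)
    lower, upper = interval(x)                 # Step 1: raw limits from observed data
    theta_hat = estimate(x)                    # Step 2: plug-in parameter estimate
    d_lower = np.empty(n)
    d_upper = np.empty(n)
    for i in range(n):
        y = simulate(theta_hat, rng)           # Step 2: pseudo-dataset from f(. | theta_hat)
        l_i, u_i = interval(y)                 # Step 3: same interval method as Step 1
        d_lower[i] = theta_hat - l_i
        d_upper[i] = theta_hat - u_i
    # Step 4: shift each limit by the matching sample quantile.
    lower_c = lower + np.quantile(d_lower, alpha / 2)
    upper_c = upper + np.quantile(d_upper, 1 - alpha / 2)
    return lower_c, upper_c

# Hypothetical usage: a deliberately too-narrow interval for a normal mean.
m = 20
data_rng = np.random.default_rng(42)
x = data_rng.normal(0.0, 1.0, size=m)

def narrow_interval(data):
    half_width = 1.0 / np.sqrt(len(data))      # a correct interval would use about 1.96
    return data.mean() - half_width, data.mean() + half_width

lower_c, upper_c = correct_interval(
    x,
    estimate=np.mean,
    simulate=lambda theta, rng: rng.normal(theta, 1.0, size=m),
    interval=narrow_interval,
    n=2000,
)
print(lower_c, upper_c)
```

In this toy setting the corrected half-width should be close to $1.96/\sqrt{m}$, up to Monte Carlo error in the sample quantiles.
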
2.3 Simple examples 

We illustrate the above procedure with two simple examples. In the first, we consider 
confidence interval correction for the mean parameter of a normal distribution with known 
variance. In the second example, it is assumed that the mean is known and that we are 
interested in the variance parameter. 

Example 1: Normal distribution with known variance 

Suppose that $\theta$ is the location parameter of a Normal distribution with unit variance, so
that $x_i \sim N(\theta, 1)$ where $x = (x_1, \ldots, x_m)$. In this case, the maximum likelihood estimator
is $\hat\theta = \bar{x} = \sum_i x_i / m$. For illustration, we suppose that the confidence interval we
obtain for $\theta$ does not have the correct coverage, in that we obtain the equivalent confidence
interval when data are generated from $x_i \sim N(\theta, (1+\epsilon)^2)$ with $\epsilon > 0$. The value of $\epsilon$
controls the amount of error in the coverage probability. Following the usual frequentist
approach, the $100(1-\alpha)\%$ confidence interval for $\theta$ is given by
$L(x) = \bar{x} + z_{\alpha/2}(1+\epsilon)/\sqrt{m}$ and $U(x) = \bar{x} + z_{1-\alpha/2}(1+\epsilon)/\sqrt{m}$, where $z_\alpha$ is the
$\alpha$-th quantile of the standard normal distribution (so that $z_{\alpha/2} = -z_{1-\alpha/2} < 0$). Clearly
the correction for the interval when $\epsilon > 0$ is $L_c(x) = L(x) - z_{\alpha/2}\,\epsilon/\sqrt{m}$
and $U_c(x) = U(x) - z_{1-\alpha/2}\,\epsilon/\sqrt{m}$.
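
As a numerical check of this example, the sketch below estimates the lower-limit correction by simulation and compares it with the closed form. To avoid any ambiguity over the sign of the normal quantile, the lower limit is written with the upper quantile as $\bar{x} - z_{1-\alpha/2}(1+\epsilon)/\sqrt{m}$, for which the exact correction is $+z_{1-\alpha/2}\,\epsilon/\sqrt{m}$. The values $m = 20$, $\epsilon = 1$ and $n = 5000$ are illustrative choices, not those used for Figure 1.

```python
import numpy as np
from statistics import NormalDist

m, eps, alpha, n = 20, 1.0, 0.05, 5000
z_hi = NormalDist().inv_cdf(1 - alpha / 2)       # upper alpha/2 point, about 1.96

def lower_limit(y):
    # Too-wide lower limit: standard deviation inflated by the factor (1 + eps).
    return y.mean(axis=-1) - z_hi * (1 + eps) / np.sqrt(m)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=m)                 # observed data, true theta = 0
theta_hat = x.mean()

y = rng.normal(theta_hat, 1.0, size=(n, m))      # n pseudo-datasets from N(theta_hat, 1)
xi_hat = np.quantile(theta_hat - lower_limit(y), alpha / 2)
xi_exact = z_hi * eps / np.sqrt(m)

print(f"simulated correction: {xi_hat:.4f}, exact correction: {xi_exact:.4f}")
```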

Figure 1 displays the results of the correction procedure for a 95% confidence interval
based on 100 replicate analyses. Each analysis is based on samples of size $m = 20$ with
$\theta = 0$, so that $x_1, \ldots, x_m \sim N(0, 1)$, and $n = 100$ replicated samples $y_1, \ldots, y_n$ with
elements drawn from $N(\hat\theta, 1)$. Figure 1 (top plots) illustrates the corrected confidence limits
$\hat L_c(x)$ and $\hat U_c(x)$ for a range of error term values, $\epsilon$. Clearly the correction produces an
unbiased adjustment, as the boxplots are centred on the true confidence bounds (the
horizontal line) in each case. Further, the method produces qualitatively the same corrected
interval limits, irrespective of the value of $\epsilon$.

The bottom plots display the corrections $\hat L_c(x)$ and $\hat U_c(x)$ with $\epsilon = 1$ fixed, for a range of
values of $\hat\theta$. For this example, choosing $\hat\theta$ to be any arbitrary value will result in the same
quality of unbiased correction. This arises as the distributions of $\hat\theta - L(y)$ and $\hat\theta - U(y)$
do not change with $\hat\theta$, so that the confidence intervals all have the same width as $\hat\theta$ varies.
As this is a location parameter only analysis, this is one case where Assumption 2 is not
required to produce a consistent adjustment (see Section 2.1).

Figure 1: Lower $\hat L_c(x)$ and upper $\hat U_c(x)$ corrected confidence limit estimates for 100
replicated analyses for the normal location model. Top plots show the corrected limits for
$\epsilon = 20, 13.33, 6.67$, with $\hat\theta = \bar{x}$. Bottom plots show the corrected limits for
$\hat\theta = -2, -1, 0, 1, 2$ with $\epsilon = 1$. The horizontal lines represent the 0.025-th (left plots) and
0.975-th (right plots) percentiles of a standard normal distribution.

Example 2: Normal distribution with known mean

Suppose now that $\theta$ is the scale (variance) parameter of a Normal distribution with mean
zero, so that $x_i \sim N(0, \theta)$. Here we specify $\hat\theta = S^2 = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar{x})^2$ as the
sample variance. In this setting, suppose that the regular confidence limits for $\theta$ are biased
downwards by a constant value $\epsilon > 0$. Specifically, the $100(1-\alpha)\%$ confidence interval
for $\theta$ is given by

$$L(x) = \frac{(m-1)S^2}{\chi^2_{1-\alpha/2;\,m-1}} - \epsilon \quad\text{and}\quad U(x) = \frac{(m-1)S^2}{\chi^2_{\alpha/2;\,m-1}} - \epsilon,$$

where $\chi^2_{\alpha;k}$ denotes the $\alpha$-th percentile of a $\chi^2_k$ distribution with $k$ degrees of freedom.

Figure 2 shows the results of the correction procedure for the lower limit of a 95%
confidence interval based on 100 replicate analyses. Each analysis uses samples of size $m$
with $\theta = 1$, so that $x_1, \ldots, x_m \sim N(0, 1)$, and $n = 2000$ replicated samples $y_1, \ldots, y_n$ with
elements drawn from $N(0, \hat\theta)$. Figure 2 (left panel) illustrates the corrected lower
confidence limit, $\hat L_c(x)$, based on a sample of size $m = 20$, for a range of fixed values of $\hat\theta$. The
extreme left and right boxplots correspond to the raw biased ($L(x)$) and true unbiased
($L_c(x)$) limits respectively. Clearly, as $\hat\theta$ changes, then so does the location of the adjusted
limits. This occurs as, in contrast with the above example, the distributions of $\hat\theta - L(y)$
and $\hat\theta - U(y)$ clearly do change with $\hat\theta$. When $\hat\theta = \theta = 1$, then the correction procedure
produces the correct adjusted limits, as indicated by the rightmost boxplot. Hence, it is
necessary to use the right value for $\hat\theta$ when making the correction.

Assumption 2 requires that $\hat\theta$ is a consistent estimator of $\theta$. Hence we can be sure that
$\hat\theta \to \theta$ as $m \to \infty$, and as a result that the distribution of $\hat\theta - L(y)$ approaches that of
$\theta - L(x)$, so that our correction procedure will perform correctly for large enough $m$. In
practice, the required value of $m$ can be moderate. Figure 2 (right panel) shows how the
correction error, $\hat L_c(x) - L_c(x)$, varies as a function of $m$. Clearly, the median error is
close to zero even for small sample sizes. However, there is some asymmetry for small
$m$, which is also visible in the left panel (e.g. compare the differences in the bias in the
boxplots with $\hat\theta = 0.4$ and $\hat\theta = 1.6$), although this is eliminated as $m$ increases.



3 Real examples 

We now consider interval estimation in two real, complex modelling situations. The 
first is an application of composite likelihood techniques in the modelling of spatial ex- 
tremes. With composite likelihoods, deriving unbiased confidence intervals can require 
a large amount of algebra, whereas biased intervals that are typically too narrow are 
easily computable. The second is an application of approximate Bayesian computation 
(ABC) methods in the modelling of a time series of g-and-k distributed observations. In 
most practical settings the mechanism behind the model fitting process within the ABC 
framework typically gives posterior credible intervals that are too large. 



3.1 Spatial extremes via composite likelihoods 



In the context of analysing spatial extremes, Padoan et al. (2010) developed a pairwise
composite likelihood model, for inference using max-stable stationary processes.
Specifically, for $m$ annual maximum daily rainfall observations, at each of $K$ spatial locations,
the pairwise composite likelihood was specified as

$$\ell_C(\theta) = \sum_{i<j} w_{ij} \log f(x_i, x_j \mid \theta),$$

where $x = (x_1, \ldots, x_K)$ and $x_j = (x_{j1}, \ldots, x_{jm})$, $f(x_i, x_j \mid \theta)$ is a known bivariate density
function with parameter vector $\theta$ evaluated at spatial locations $i$ and $j$, and $w_{ij} \geq 0$ are
weights such that $\sum_{ij} w_{ij} = 1$. Under the usual regularity conditions, the maximum
composite likelihood estimator, $\hat\theta$, can provide asymptotically unbiased and normally
distributed parameter estimates when standard likelihood estimators are unavailable (e.g.
Varin et al. 2011).

Figure 2: Left panel: Boxplots of the corrected lower confidence limit, $\hat L_c(x)$, when holding $\hat\theta$
fixed at various values between 0.4 and 1.6 (true value is $\theta = 1$). Leftmost and rightmost boxplots
correspond to the biased ($L(x)$) and true unbiased ($L_c(x)$) lower limits respectively. Right panel:
Boxplots of the correction error $\hat L_c(x) - L_c(x)$ as a function of observed data sample size $m$. All
boxplots are based on 100 replicate analyses.



Specifically, we have (e.g. Huber 1967) that $\hat\theta \sim N(\theta, I^{-1}(\theta))$, with

$$I(\theta) = H(\theta) J(\theta)^{-1} H(\theta), \qquad (4)$$

where $H(\theta)$ and $J(\theta)$ are respectively the expected information matrix and the covariance
matrix of the score vector. In the ordinary maximum likelihood setting, $H(\theta) = J(\theta)$. In
the max-stable process framework, Padoan et al. (2010) provided an analytic
expression for $J(\theta)$ for a particular (Gaussian) spatial dependence model. Combined with the
standard numerical estimates of $H(\theta)$, this allowed for the construction of standard
confidence intervals for $\theta$. However, for composite likelihood techniques in general, obtaining
analytic expressions or numerical estimates of $J(\theta)$ can be challenging, whereas estimates
of $H(\theta)$ are readily available. In this example, we demonstrate how our proposed method
can be employed to correct the too narrow confidence intervals that result from using
$I(\theta) = H(\theta)$. We then compare our results with those derived from the known maximum
composite likelihood information matrix (4).
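
The difference between the two choices of information matrix can be illustrated with a short sketch. It assumes that estimates of $H$ and $J$ are available on the same (full sample) scale, so that the asymptotic covariance of $\hat\theta$ is $I^{-1} = H^{-1} J H^{-1}$ under (4) and $H^{-1}$ under the naive choice $I = H$. The matrices below are hypothetical values chosen only for illustration, and the function names are introduced here.

```python
import numpy as np

def sandwich_covariance(H, J):
    """Inverse of I = H J^{-1} H, i.e. H^{-1} J H^{-1}."""
    H_inv = np.linalg.inv(H)
    return H_inv @ J @ H_inv

def wald_half_widths(cov, z=1.959964):
    """Half-widths of normal-theory 95% intervals from a covariance matrix."""
    return z * np.sqrt(np.diag(cov))

# Hypothetical matrices: the score is three times more variable than H suggests.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
J = 3.0 * H

naive = wald_half_widths(np.linalg.inv(H))          # uses I = H
robust = wald_half_widths(sandwich_covariance(H, J))
ratio = robust / naive
print(ratio)                                        # each entry equals sqrt(3)
```

When $J = H$ the two covariances coincide, which is the ordinary maximum likelihood case noted above.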



We considered four spatial models for stationary max-stable processes that describe
different degrees of extremal dependence, with parameter inference based on $m = 100$
observations at each of $K = 50$ randomly generated spatial locations. Each model expresses
the degree of extremal dependence via the covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix},$$

where the values for each parameter for each model $M_1, \ldots, M_4$ are given in Table 1.
Model $M_4$ has an additional non-stationary spatial component that is modelled by the
marginal parameters $\mu$, $\lambda$ and $\xi$ (corresponding to location, scale and shape parameters)
through the response surface

$$\mu = \alpha_0 + \alpha_1\,\mathrm{lat} + \alpha_2\,\mathrm{lon}$$
$$\lambda = \beta_0 + \beta_1\,\mathrm{lon}$$
$$\xi = \gamma_0,$$

where lat and lon denote latitude and longitude coordinates.



Model          $\sigma_1^2$    $\sigma_{12}$    $\sigma_2^2$
$M_1, M_4$     9/8             0                9/8
$M_2$          2000            1200             2800
$M_3$          25              35               14

Table 1: Covariance matrix configurations for models $M_1, \ldots, M_4$ for the extremal spatial
dependence analysis.

Tables 2 and 3 summarise the empirical coverage probabilities for nominal 95%
confidence intervals for models $M_1, \ldots, M_4$ based on 500 replicate analyses. Columns $C_1$
provide the interval coverage using the standard Hessian matrix $I(\theta) = H(\theta)$, and columns
$C_2$ provide the same using the composite likelihood information matrix (4) following
Padoan et al. (2010). Columns $C_3$ correspond to our correction procedure when applied
to the intervals in column $C_1$ using the standard Hessian matrix. For the correction
procedure we used $\hat\theta$, the maximum composite likelihood estimate, and $n = 500$ simulated
datasets to perform the adjustment.

From Table 2, clearly confidence interval coverage based on the standard Hessian matrix
($C_1$) is too low. The coverage using the sandwich information matrix ($C_2$) is very good,
with all the reported values close to 0.95. The coverage values obtained using our
adjustment procedure, which is based on the intervals in column $C_1$, are also very close to 0.95,
and mostly closer than with the sandwich information matrix. Similar results are
obtained for model $M_4$ in Table 3. Taken together, these results indicate that our adjustment
procedure can successfully modify the upper and lower limits of a confidence interval to
achieve comparable results to established methods in complex settings. However, it does
not make use of the algebraic representation of $J(\theta)$ in this case, and so is more easily
extended to alternative models (e.g. where $J(\theta)$ is not available), albeit at a moderate
computational cost.
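
When comparing empirical coverages such as these, it can help to keep the Monte Carlo error in mind. As a rough guide, and assuming the replicate analyses are independent, an empirical coverage based on a given number of replicates has a binomial standard error. The short calculation below is an illustration added here (the function name is hypothetical); for 500 replicates and nominal coverage 0.95 it gives a standard error of a little under 0.01.

```python
import math

def coverage_standard_error(p, n_rep):
    """Binomial standard error of an empirical coverage based on n_rep replicates."""
    return math.sqrt(p * (1.0 - p) / n_rep)

se = coverage_standard_error(0.95, 500)
print(f"standard error: {se:.4f}")
```

On this basis, differences of one or two hundredths between columns are within the range that simulation noise alone could produce.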





                 $M_1$                    $M_2$                    $M_3$
                 $C_1$   $C_2$   $C_3$    $C_1$   $C_2$   $C_3$    $C_1$   $C_2$   $C_3$
$\sigma_1^2$     0.428   0.960   0.960    0.098   0.947   0.950    0.092   0.939   0.944
$\sigma_{12}$    0.518   0.940   0.960    0.122   0.955   0.956    0.154   0.921   0.960
$\sigma_2^2$     0.468   0.930   0.936    0.092   0.955   0.938    0.102   0.940   0.952

Table 2: Empirical coverage probabilities for 95% confidence intervals of the parameters of
models $M_1$, $M_2$, and $M_3$ based on 500 replicate analyses. Columns indicate confidence interval
estimation methods using: ($C_1$) the standard Hessian matrix $I(\theta) = H(\theta)$; ($C_2$) the sandwich
information matrix $I(\theta) = H(\theta)J^{-1}(\theta)H(\theta)$; and ($C_3$) the standard Hessian matrix
$I(\theta) = H(\theta)$ followed by our correction procedure.







         $\sigma_1^2$  $\sigma_{12}$  $\sigma_2^2$  $\alpha_0$  $\alpha_1$  $\alpha_2$  $\beta_0$  $\beta_1$  $\gamma_0$
$C_1$    0.372         0.544          0.388         0.114       0.122       0.098       0.108      0.140      0.104
$C_2$    0.930         0.925          0.945         0.935       0.945       0.945       0.935      0.940      0.910
$C_3$    0.924         0.924          0.958         0.950       0.940       0.944       0.944      0.952      0.952

Table 3: Empirical coverage probabilities for 95% confidence intervals of the parameters of model
$M_4$ based on 500 replicate analyses. Columns indicate confidence interval estimation methods
using: ($C_1$) the standard Hessian matrix $I(\theta) = H(\theta)$; ($C_2$) the sandwich information matrix
$I(\theta) = H(\theta)J^{-1}(\theta)H(\theta)$; and ($C_3$) the standard Hessian matrix $I(\theta) = H(\theta)$ followed by our
correction procedure.



3.2 Exchange rate analysis using approximate Bayesian computation 

Approximate Bayesian computation (ABC) describes a family of methods of
approximating a posterior distribution when the likelihood function is computationally intractable,
but where sampling from the likelihood is possible (e.g. Beaumont et al. 2002, Sisson and
Fan 2011). These methods can be thought of as constructing a conditional density
estimate of the posterior (Blum 2010), where the scale parameter, $h > 0$, of the kernel density
function controls both the level of accuracy of the approximation, and the computation
required to construct it. Lower $h$ results in more accurate posterior approximations, but
in return requires considerably more computation. As such, moderate values of the scale
parameter are often used in practice. Accordingly, this typically results in oversmoothed
estimates of the posterior, and in turn, credible intervals that are too wide.
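
The effect of the kernel scale parameter on interval width can be seen in a toy setting. The sketch below uses simple rejection ABC with a uniform kernel, not the sequential Monte Carlo algorithm used later in this section, and every modelling choice in it is an assumption made for illustration: a uniform prior on $(-5, 5)$, the sample mean of $m = 20$ observations from $N(\theta, 1)$ as the summary statistic, and an observed summary of zero.

```python
import numpy as np

def rejection_abc(s_obs, h, n_draws=200_000, m=20, seed=0):
    """Rejection ABC for a normal mean with a uniform kernel of scale h."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-5.0, 5.0, size=n_draws)     # draws from the prior
    s = rng.normal(theta, 1.0 / np.sqrt(m))          # simulated summary (sample mean)
    return theta[np.abs(s - s_obs) < h]              # keep draws with |s - s_obs| < h

def central_interval_width(sample, level=0.95):
    lo, hi = np.quantile(sample, [(1 - level) / 2, (1 + level) / 2])
    return hi - lo

width_small_h = central_interval_width(rejection_abc(0.0, h=0.05))
width_large_h = central_interval_width(rejection_abc(0.0, h=1.0))
print(f"h = 0.05: {width_small_h:.3f}, h = 1.0: {width_large_h:.3f}")
```

With the small value of $h$ the interval width is close to the exact posterior width of about $2 \times 1.96/\sqrt{20}$, while the larger value accepts many more draws but gives a visibly wider interval.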

We consider an analysis of daily exchange rate log returns of the British pound to the
Australian dollar between 2005 and 2007. Drovandi and Pettitt (2011) developed an MA(1)
type model for these data where the individual log returns were modelled by a g-and-k
distribution (Rayner and MacGillivray 2002). The g-and-k distribution is typically
defined through its quantile function

$$Q(z(p); \theta) = a + b\left(1 + c\,\frac{1 - \exp(-g z(p))}{1 + \exp(-g z(p))}\right)\left(1 + z(p)^2\right)^k z(p), \qquad (5)$$

where $\theta = (a, b, g, k)$ are parameters controlling location, scale, skewness and kurtosis,
and $z(p)$ is the $p$-quantile of a standard normal distribution. The parameter $c = 0.8$ is
typically fixed. We used the sequential Monte Carlo-based ABC algorithm in Drovandi
and Pettitt (2011), based on 2,000 particles, to fit the MA(1) model. The data-generation
process, used in both ABC and our correction procedure, consists of drawing dependent
quantiles $z_i = (\eta_i + \alpha\,\eta_{i-1})/\sqrt{1 + \alpha^2}$ for $i = 1, \ldots, n$, where $\eta_i \sim N(0, 1)$ for $i = 0, \ldots, n$,
and then substituting $z(p) = z_i$ in (5).
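
The data-generation process just described can be sketched as follows. This is a minimal illustration assuming the quantile function in (5); the function names, the argument name `alpha_ma` for the MA(1) coefficient, and the parameter values in the usage lines are hypothetical and are not the fitted values.

```python
import numpy as np

def gk_quantile(z, a, b, g, k, c=0.8):
    """g-and-k quantile function evaluated at standard normal quantiles z."""
    z = np.asarray(z, dtype=float)
    skew = (1.0 - np.exp(-g * z)) / (1.0 + np.exp(-g * z))
    return a + b * (1.0 + c * skew) * (1.0 + z**2) ** k * z

def simulate_gk_ma1(n, a, b, g, k, alpha_ma, rng, c=0.8):
    """Draw n observations with g-and-k margins and MA(1) dependence."""
    eta = rng.normal(size=n + 1)                       # eta_0, ..., eta_n
    # Dependent quantiles; each z_i is marginally N(0, 1).
    z = (eta[1:] + alpha_ma * eta[:-1]) / np.sqrt(1.0 + alpha_ma**2)
    return gk_quantile(z, a, b, g, k, c)

# Hypothetical parameter values, for illustration only.
rng = np.random.default_rng(0)
returns = simulate_gk_ma1(500, a=0.0, b=0.002, g=0.1, k=0.4, alpha_ma=0.2, rng=rng)
print(returns[:5])
```

Setting $g = 0$ and $k = 0$ reduces the quantile function to $a + b\,z$, which gives a simple check of an implementation.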

Table 4 shows the estimated 95% central credible intervals, and their widths, for each
model parameter based on the ABC kernel scale parameter $h = 0.016$ (following Drovandi
and Pettitt 2011), and also the lower value of $h = 0.009$. Also shown are the
intervals obtained after performing a local-linear, ridge regression-adjustment (Blum et al.
2012, Beaumont et al. 2002) on the posterior obtained with $h = 0.016$. The
regression-adjustment is a standard ABC technique for improving the precision of an ABC posterior
approximation, which aims to estimate the posterior at $h = 0$ based on an assumed
regression model.

Clearly the parameter credible intervals obtained with $h = 0.009$ are narrower than those
obtained with $h = 0.016$, suggesting that the wider intervals indeed have greater than 95%
coverage. The regression-adjusted intervals generally have widths somewhere between
the intervals constructed with $h = 0.016$ and $h = 0.009$. The suggestion from Table 4 is
that even narrower (i.e. more accurate) credible intervals may result if it were possible to
reduce $h$ further.



Table 5 shows the corrected 95% central credible interval estimates, obtained from the
ABC posterior approximations with kernel scale parameter $h = 0.016$, $0.02$ and $0.03$. The
correction was based on $n = 500$ simulated datasets and using the posterior mean as the
estimate $\hat\theta$ of $\theta$. The results of the correction across the three kernel scale parameter values
are similar, suggesting potential computational savings in the ABC posterior simulation
stage, as one may perform the analysis with larger values of $h$. All parameters achieve
equivalent or improved precision compared to the most precise ABC posterior estimate
obtained with $h = 0.009$.

While for a standard Bayesian analysis, the posterior mean is a consistent estimator of $\theta$,
this may not be true in the case of the ABC approximate posterior for $h > 0$, as the location
and shape of the ABC posterior can change with $h$. As such, some care may be needed
when implementing our correction procedure in this setting. In the current analysis, a
preliminary investigation suggested that estimates of the posterior mean stabilised below
$h = 0.03$. As such, we are confident that this represents an accurate estimate $\hat\theta$ of $\theta$ in this
case.





            h = 0.016            Width     h = 0.009            Width     Reg. Adj. (h = 0.016)   Width
$a$         (-0.0006, 0.0002)    0.0008    (-0.0004, 0.0001)    0.0005    (-0.0006, 0.0002)       0.0008
$b$         ( 0.0018, 0.0028)    0.0010    ( 0.0019, 0.0026)    0.0007    ( 0.0018, 0.0027)       0.0009
$g$         (-0.0267, 0.2573)    0.2840    (-0.0044, 0.2138)    0.2182    (-0.0286, 0.2505)       0.2791
$k$         ( 0.2024, 0.5061)    0.3037    ( 0.2607, 0.5322)    0.2715    ( 0.2148, 0.5092)       0.2944
$\alpha$    ( 0.1413, 0.2713)    0.1300    ( 0.1491, 0.2771)    0.1280    ( 0.1489, 0.2742)       0.1253

Table 4: 95% central credible intervals and corresponding interval widths from the
g-and-k distribution MA(1) model. Results obtained using ABC posterior approximation
with kernel scale parameter $h = 0.016$ and $h = 0.009$, and following a ridge
regression-adjustment based on an ABC posterior approximation with $h = 0.016$.





            h = 0.016            Width     h = 0.02             Width     h = 0.03              Width
$a$         (-0.0003, 0.0000)    0.0003    (-0.0003, 0.0000)    0.0003    (-0.0004, -0.0001)    0.0005
$b$         ( 0.0020, 0.0024)    0.0004    ( 0.0021, 0.0025)    0.0004    ( 0.0021, 0.0024)     0.0003
$g$         ( 0.0303, 0.2156)    0.1853    ( 0.0173, 0.1818)    0.1645    ( 0.0204, 0.1957)     0.1753
$k$         ( 0.2769, 0.4099)    0.1330    ( 0.2909, 0.4235)    0.1326    ( 0.2768, 0.4129)     0.1362
$\alpha$    ( 0.1430, 0.2659)    0.1229    ( 0.1335, 0.2708)    0.1373    ( 0.1513, 0.2714)     0.1201

Table 5: Adjusted 95% central credible intervals and corresponding interval widths
from the g-and-k distribution MA(1) model. Adjusted intervals based on correcting ABC
posterior approximations with kernel scale parameter $h = 0.016$, $0.02$ and $0.03$.



4 Discussion 

In this article we have introduced a method of adjusting confidence interval estimates to 
have a correct nominal coverage probability. This method was developed in the frequen- 
tist framework, but may be equally applied to ensure that Bayesian credible intervals 
possess the (frequentist) coverage property. Our approach is general and makes mini- 
mal assumptions: namely that it is possible to generate data under the same procedure 
(model) that produced the observed data, and that a consistent estimator is available for 
the parameter of interest. The correction is asymptotically unbiased, although it can work
well for moderate sample sizes ($m$), and there is a central limit theorem for the corrected
interval limits in terms of the number ($n$) of auxiliary samples used to implement the
correction.

In the examples that we have considered, we have found that our method can produce
confidence intervals which perform comparably to existing gold standard approaches,
though with greater scope for extension to more complicated models, and can provide
a reliable method of adjusting approximately obtained credible intervals in challenging
settings.

One potential criticism of our approach is that it requires the construction of a large
number ($n$) of confidence or credible intervals in order to correct one interval. In the
case where constructing a single interval is computationally expensive, implementing
the correction procedure in full can result in a large amount of computation. This was
the case in our exchange rate data analysis using ABC methods, although using an
alternative ABC algorithm such as regression-adjustment (based on a single large set
of model simulations) would have been more efficient. Regression-adjustment can
itself perform poorly if the assumed regression model is incorrect. However, as our
correction procedure makes minimal assumptions, we may still have good confidence in the
adjusted intervals it provides.
Appendix: Proofs

Proof of Theorem 2.1

Writing $L_c(x) = L(x) + \delta$, then

$$P(\theta < L_c(x)) = P(\theta < L(x) + \delta) = P(\theta - L(x) < \delta).$$

Hence, by definition, $P(\theta < L_c(x)) = \alpha/2$ if $\delta = \xi_{\alpha/2}$, where $\xi_{\alpha/2}$ is the $\alpha/2$-th quantile of
the distribution of $\theta - L(x)$.



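Proof of Theorem 2.2

Let $G_{\{\theta - L(x)\}}$ be the distribution function of $\theta - L(x)$, which has positive first derivatives
so that $G'_{\{\theta - L(x)\}}(\xi) = \frac{d}{d\xi} G_{\{\theta - L(x)\}}(\xi) > 0$ for all $\xi$. Also let $\hat G_{\{\hat\theta - L(y)\}}$ be the
empirical distribution of $\hat\theta - L(y)$ based on the samples $\hat\theta - L_i(y_i)$, $i = 1, \ldots, n$. From Theorem
2.1 we have that $L_c(x) = L(x) + \xi_{\alpha/2}$ for some $\alpha \in (0, 1)$, where $G_{\{\theta - L(x)\}}(\xi_{\alpha/2}) = \alpha/2$.
Let $\hat\xi_{\alpha/2} = \hat G^{-1}_{\{\hat\theta - L(y)\}}(\alpha/2)$ be the empirical estimate of $\xi_{\alpha/2}$.

If we define $\hat L_c(x) = L(x) + \hat\xi_{\alpha/2}$, then for any $w \in \mathbb{R}$, we have

$$\Pr\big(\sqrt{n}(\hat L_c(x) - L_c(x)) < w\big) = \Pr\big(\sqrt{n}(\hat\xi_{\alpha/2} - \xi_{\alpha/2}) < w\big)$$
$$= \Pr\big(G_{\{\theta - L(x)\}}(\hat\xi_{\alpha/2}) < G_{\{\theta - L(x)\}}(\xi_{\alpha/2} + w/\sqrt{n})\big)$$
$$= \Pr\big(G_{\{\theta - L(x)\}}(\hat\xi_{\alpha/2}) < \alpha/2 + [w\,G'_{\{\theta - L(x)\}}(\xi_{\alpha/2}) + o(1)]/\sqrt{n}\big),$$

where the last equality follows from a first order Taylor expansion of $G$ at $\xi_{\alpha/2}$.

If $Y$ represents the number of times that $G(\xi)$ is smaller than
$\zeta = \alpha/2 + [w\,G'_{\{\theta - L(x)\}}(\xi_{\alpha/2}) + o(1)]/\sqrt{n}$, then since $G(\xi) \sim U(0, 1)$, we have
$Y \sim \mathrm{Binomial}(n, \zeta)$. Hence

$$\frac{Y - n\zeta}{\sqrt{n\zeta(1-\zeta)}} \to N(0, 1) \qquad (6)$$

in distribution as $n \to \infty$ (van der Vaart 2000).

Let $r_n$ be the integer rank of the $\alpha/2$-th quantile from a data set $X = \{X_1, \ldots, X_n\}$ of
length $n$, such that $\hat\xi_{\alpha/2} = X_{(r_n)}$. If we assume that $(r_n - n\alpha/2)/\sqrt{n} \to 0$ as $n \to \infty$, then
from (6) we have

$$\Pr\big(\sqrt{n}(\hat\xi_{\alpha/2} - \xi_{\alpha/2}) < w\big) = \Pr\big(G_{\{\theta - L(x)\}}(\hat\xi_{\alpha/2}) < \zeta\big) = \Pr(Y \geq r_n)$$
$$= \Pr\left(\frac{Y - n\zeta}{\sqrt{n\zeta(1-\zeta)}} \geq \frac{r_n - n\zeta}{\sqrt{n\zeta(1-\zeta)}}\right)$$
$$\to \Phi\left(\frac{n\alpha/2 + \sqrt{n}\,w\,G'_{\{\theta - L(x)\}}(\xi_{\alpha/2}) - r_n}{\sqrt{n(\alpha/2)(1-\alpha/2)}}\right)$$
$$= \Phi\left(\frac{n(\alpha/2) - r_n}{\sqrt{n(\alpha/2)(1-\alpha/2)}} + \frac{w\,G'_{\{\theta - L(x)\}}(\xi_{\alpha/2})}{\sqrt{(\alpha/2)(1-\alpha/2)}}\right)$$
$$\to \Phi\left(\frac{w\,G'_{\{\theta - L(x)\}}(\xi_{\alpha/2})}{\sqrt{(\alpha/2)(1-\alpha/2)}}\right), \qquad (7)$$

where $\Phi$ denotes the standard normal distribution function.

It then follows that the consistency of the estimator $\hat L_c$ can be established as

$$\lim_{n\to\infty} \Pr\big(|\hat L_c(x) - L_c(x)| > \epsilon\big) \leq \lim_{n\to\infty} \frac{E\big[|\hat L_c(x) - L_c(x)|\big]}{\epsilon} = 0$$

by the Markov inequality (Ash and Doléans-Dade 2000).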